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Abstract 

Differential geometry (DG) based solvation models are a new class of variational implicit solvent approaches that are 
able to avoid unphysical solvent-solute boundary definitions and associated geometric singularities, and dynamically couple 
polar and nonpolar interactions in a self-consistent framework. Our earlier study indicates that DG based nonpolar solvation 
model outperforms other methods in nonpolar solvation energy predictions. However, the DG based full solvation model has 
not shown its superiority in solvation analysis, due to its difficulty in parametrization, which must ensure the stability of the 
solution of strongly coupled nonlinear Laplace-Beltrami and Poisson-Boltzmann equations. In this work, we introduce new 
parameter learning algorithms based on perturbation and convex optimization theories to stabilize the numerical solution 
and thus achieve an optimal parametrization of the DG based solvation models. An interesting feature of the present DG 
based solvation model is that it provides accurate solvation free energy predictions for both polar and nonploar molecules 
in a unified formulation. Extensive numerical experiment demonstrates that the present DG based solvation model delivers 
some of the most accurate predictions of the solvation free energies for a large number of molecules. 
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I Introduction 

Biological processes, such as signaling, gene regulation, transcription, translation, et cetera govern the cell growth, cellular 
differentiation, fermentation, fertilization, germination, etc. in living organisms. Chemical processes, such as oxidation, re¬ 
duction, hydrolysis, nitrification, polymerization, and so forth underpin biological processes. Physical processes, particularly 
solvation, are involved in all the aforementioned chemical and biological processes. Therefore, a prerequisite for the under¬ 
standing of chemical and biological processes is to study the solvation process. As a physical process, solvation does not 
involve the formation and/or breaking of any covalent bond, but is associated with solvent and solute electrostatic, dipolar, 
induced dipolar, and van der Waals interactions. 

Experimentally, solvation can be analyzed by the measurement of solvation free energies. Theoretically, solavtion can 
be investigated by quantum mechanics, molecular mechanics, integral equation, implicit solvent models, and simple phe¬ 
nomenological modifications of Coulombs law. Among, the implicit solvent models are known to balance the computational 
complexity and the accuracy in the solvation free energy prediction, and thus, offer an efficient approach. 

The general idea of implicit solvent models is to treat the solvent as a dielectric continuum and describe the solute in atom¬ 
istic detail. 23 ’ 41 ’ 43 ’ 59 ’ 62 The total solvation free energy is decomposed into nonpolar and polar parts. There is a wide variety 
of ways to carry out this decomposition. For example, nonpolar energy contributions can be modeled in two stages: the work 
of displacing solvent when adding a rigid solute to the solvent and the dispersive nonpolar interactions between the solute 
atoms and surrounding solvent. The polar part is due to the electrostatic interactions and can be approximated by general¬ 
ized Born (GB), 2 ’ 12 ’ 24 ’ 31 ’ 36 ’ 44 ’ 51 ’ 56 ’ 66 ’ 68 ’ 87 polarizable continuum (PC) 67 and Poisson-Boltzmann (PB) models. 1 ’ 23,29 ’ 46 ’ 62 ’ 84 ’ 86 
Among them, GB models are heuristic approaches to polar solvation energy analysis. PC models resort to quantum me¬ 
chanical calculations of induced solute charges. PB methods can be formally derived from Maxwell equations and statistical 
mechanics for electrolyte solutions 7,40 ’ 52 and therefore offer the promise of handling large biomolecules with sufficient ac¬ 
curacy and robustness. 2 ’ 22,55 

Conceptually, the separation between continuum solvent and the discrete (atomistic) solute introduces an interface def¬ 
inition. This definition may take the form of analytic functions 34-36 or nonsmooth boundaries dividing the solute-solvent 
domains. The van der Waals surface, solvent accessible surface, 47 and molecular surface (MS) 58 are devised for this pur¬ 
pose and have found their success in biophysical calculations. 8 ’ 20 ’ 27 ’ 42,45 ’ 48 ’ 49 ’ 63 It has been noticed that the performance 
of implicit solvent models is very sensitive to the interface definition. 25,26 ’ 54 ’ 64 This comes as no surprise because many 
of these popular interface definitions are ad hoc divisions of the solute and solvent domains based on rigid molecular ge¬ 
ometry and neglecting solute-solvent energetic interactions. Additionally, geometric singularities 18,60 associated with these 
surface definitions incur enormous computational instability 76,77,86 and lead to conceptual difficulty in interpreting the sharp 
interface. 15 

The differential geometry (DG) theory of surfaces 75 and associated geometric partial differential equations (PDEs) provide 
a natural description of the solvent-solute interface. In 2005, Wei and his collaborators introduced curvature-controlled PDEs 
for generating molecular surfaces in solvation analysis. 73 The first variational solvent-solute interface, namely, the minimal 
molecular surface (MMS), was constructed in 2006 by Wei and coworkers based on the DG theory of surfaces. 4-6 MMSs 
are constructed by solving the mean curvature flow, or the Laplace-Beltrami flow, and have been applied to the calculation 
of electrostatic potentials and solvation free energies. 6,16 This approach was generalized to potential-driven geometric 
flows, which admit physical interactions, for the surface generation of biomolecules in solution. 3 While our approaches 
were employed and/or modified by many others 17,79-81 for molecular surface and solvation analysis, our geometric PDE 73 
and variational surface models 3,4,6 are, to our knowledge, the first of their kind for solvent-solute interface and solvation 
modeling. 

Since the surface area minimization is equivalent to the minimization of surface free energies, due to a constant surface 
tension, this approach can be easily incorporated into the variational formulation of the PB theory 33,61 to result in DG- 
based full solvation models, 13,70 following a similar approach by Dzubiella ef a/. 28,83 Our DG-based solvation models have 
been implemented in the Eulerian formulation, where the solvent-solute interface is embedded in the three-dimensional 
(3D) Euclidean space and behaves like a smooth characteristic function. 13 The resulting interface and associated dielectric 
function vary smoothly from their values in the solute domain to those in the solvent domain and are computationally robust. 
An alternative implementation is the Lagrangian formulation 14 in which the solvent-solute boundary is extracted as a sharp 
surface at a given isovalue and subsequently used in the solvation analysis, including nonpolar and polar modeling. 

One major advantage of our DG based solvation model is that it enables the synergistic coupling between the solute and 
solvent domains via the variation procedure. As a result, our DG based solvation model is able to significantly reduce the 
number of free parameters that users must “fit” or adjust in applications to real-world systems. 65 It has been demonstrated 
that physical parameters, i.e., pressure and surface tension obtained from experimental data, can be directly employed in 
our DG-based solvation models for accurate solvation energy prediction. 21 Another advantage of our DG based solvation 
model is that it avoids the use of ad hoc surface definitions and its interfaces, particularly ones generated from the Eulerian 
formulation, 13 are free of troublesome geometric singularities that commonly occur in conventional solvent-accessible and 
solvent-excluded surfaces. 19,60 As a result, our DG based solvation model bypasses the sophisticated interface techniques 
required for solving the PB equation. 32,76,77 In particular, the smooth solvent-solute interface obtained from the Eulerian 
formulation 13 can be directly interpreted as the physical solvent-solute boundary profile. Additionally, the resulting smooth 
dielectric boundary can also have a straightforward physical interpretation. The other advantage of our DG based solvation 
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model is that it is nature and easy to incorporate the density functional theory (DFT) in its variational formulation. Conse¬ 
quently, it is able to reevaluate and reassign the solute charge induced by solvent polarization effect during the solvation 
process . 15 The resulting total energy minimization process recreates or resembles the solvent-solute interactions, i.e., polar¬ 
ization, dispersion, and polar and nonpolar coupling in a realistic solvation process. Recently, DG based solvation model has 
been extended to DG based multiscale models for non-equilibrium processes in biomolecular systems . 10 ’ 11 ’ 70 ’ 72,74 These 
models recover the DG based solvation model at the equilibrium . 74 

Recently, we have demonstrated 16 that the DG based nonpolar solvation model is able to outperform many other meth¬ 
ods 30 ' 57,69 in solvation energy predictions for a large number nonpolar molecules. The root mean square error (RMSE) of 
our predictions was below 0.4kcal/mol, which clearly indicates the potential power of the DG based solvation formulation. 
However, the DG based full solvation model has not shown a similar superiority in accuracy, although it works very well . 13,14 
Having so many aforementioned advantages, our DG based solvation models ought to outperform other methods with a 
similar level of approximations. One obstacle that hinders the performance of our DG based full solvation model is the 
numerical instability in solving two strongly coupled and highly nonlinear PDEs, namely, the generalized Laplace-Beltrami 
(GLB) equation and the generalized PB (GPB) equation. To avoid such instability, a strong parameter constraint was applied 
to the nonpolar part in our earlier work , 13,14 which results in the reduction of our model accuracy. 

The objective of the present work is to explore a better parameter optimization of our DG based solvation models. A pair 
of conditions is prescribed to ensure the physical solution of the GLB equation, which leads to the well-posedness of the 
GPB equation. Such a well-posedness in turn renders the stability of solving the GLB equation. The stable solution of the 
coupled GLB and GPB equation enables us to optimize the model parameters and produce the highly accurate prediction of 
solvation free energies. Some of the best results are obtained in the solvation free energy prediction of more than a hundred 
molecules of both polar and nonpolar types. 

The rest of this paper is organized as the follows. To establish the notation and facilitate further development, we present a 
brief review of our DG based solvation models in Section II. By using the variational principle, we derive the coupled GLB and 
GPB equations. Necessary boundary conditions and initial values are prescribed to make this coupled system well-posed. 
Section III is devoted to parameter learning algorithms. We develop a protocol to stabilize the iterative solution process of 
coupled nonlinear PDEs. We introduce perturbation and convex optimization methods to ensure stability of the numerical 
solution of the GLB equation in coupling with the GPB equation. The newly achieved stability in solving the coupled PDEs 
leads to an appropriate minimization of solvation free energies with respect to our model parameters. In Section IV, we 
show that for more than a hundred of compounds of various types, including both polar and nonpolar molecules, the present 
DG solvation model offers some of the most accurate solvation free energy prediction with the overall RMSE of 0.5kcal/mol. 
This paper ends with a conclusion. 

II The DG based solvation model 

The free energy functional for our DG based full solvation model can be expressed as 13,14,71 


G[S, $] = y {7|V5|+p5+(l-5)t/ + 5[-^|V $| 2 + $p m 

r I'] (1) 

+(1-5) > dr, r e R 3 

a; J ) 

where 7 is the surface tension, p is the hydrodynamic pressure difference between solvent and solute, and U denotes 
the solvent-solute non-electrostatic interactions represented by the Lennard-Jones potentials in the present work. Here 
0 < 5 < 1 is a hypersurface or simply surface function that characterizes the solute domain and embeds the 2D surface in 
R 3 , whereas 1-5 characterizes the solvent domain . 13 Additionally, $ is the electrostatic potential and e s and e m are the 
dielectric constants of the solvent and solute, respectively. Here k B is the Boltzmann constant, T is the temperature, p a0 
denotes the reference bulk concentration of the ath solvent species, and q a denotes the charge valence of the ath solvent 
species, which is zero for an uncharged solvent component. We use p m to represent the charge density of the solute. The 
charge density is often modeled by a point charge approximation 

N m 

p m = J2QjS(r-r J ), 
j 

where Q, denoting the partial charge of the jth atom in the solute. Alternatively, the charge density computed from the DFT, 
which changes during the iteration or energy minimization, can be directly employed as well . 15 

In Eq. (1), the first three terms consist of the so called nonpolar solvation free energy functional while the last two terms 
form the polar one. After the variation with respect to 5, we construct the following generalized Laplace-Beltrami (GLB) 
equation by using a procedure discussed in our earlier work 3 


<95 

~dt 
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where the potential driven term is given by 

V = -p + U + ^|V $| 2 -*p m - ||V $| 2 - ksT^PM (e - ^ - l) . 

a. 

As in the nonpolar case, solving the generalized Laplace-Beltrami equation (2) generates the solvent-solute interface through 
the surface function S. 

Additionally, variation with respect to $ gives rise to the generalized Poisson-Boltzmann (GPB) equation: 

- V • (e(S)V$) = Sp m + (1 -S)^2 q aPa oe-*&, (3) 


where e(S) = (1 - S)e s + Se m is the generalized permittivity function. As shown in our earlier work, 13 ’ 71 e(S) is a smooth 
dielectric function gradually varying from e m to e s . Thus, the solution procedure of the GPB equation avoids many numerical 
difficulties of solving elliptic equations with discontinuous coefficients 77 ’ 78 ’ 82 ’ 85 ’ 86 in the standard PB equation. 

The GLB (2) and GBP (3) equations form a highly nonlinear system, in which the GLB equation is solved for the interface 
profile S of the solute and solvent. The interface profile determines the dielectric function e(S) in the GPB equation. The 
GPB equation is solved for the electrostatics potential $ that behaves as an external potential in the GLB equation. The 
strongly coupled system should be solved in self-consistent iterations. 

For GLB equation (2), the computational domain is L2/f2^ w , where is the solute van der Waals domain given by 
= Ui B(rJ dW ). Here B(rJ dW ) is the ith ball in the solute centered at r, with van der Waals radius rj dw . We apply the 
following Dirichlet boundary condition to S(r,t) 


S{r,t) 


0, Vr G dfl 
i, Vr g dn v ™ 


(4) 


The initial value of S(r,t) is given by 


S( r,0) 


1, Vr G , 
0, otherwise, 


(5) 


where dfi^* 1S the boundary of the extended solute domain constructed by f l™* = |J. B(r7 dW +r probe ). Here B(rJ dW +r piobe ) 
has an extended radius of rj dw + r probe with r probe being the probe radius, which is set to 1.4A in the present work. 

For GPB equation (3), the computational domain is Q. We set the Dirichlet boundary condition via the Debye-Huckel 
expression, 


$(r) = 


Nm 

E 

i =1 


Qi 


e* r — r, ; 


0 —K,\r—ri | 


Vr G <9f 1 , 


( 6 ) 


where R is the modified Debye-Huckel screening function, 14 which is zero if there is no salt molecule in the solvent. Note 
that no interface condition 76 is needed as S and e(S) are smooth functions in general for t > 0. Consequently, the resulting 
GBP (3) equation is easy to solve. 

To compare with experimental solvation data, one needs to compute the total solvation free energy, which, in our DG 
based solvation model, is obtained as 

AG = AG p + G np , (7) 

where AG P is the electrostatic solvation free energy, 


AGP = ^E^[ $ ( r *)-^( r *)] < 8 ) 

where is the solution of the above the GPB model in a homogenous system, obtained by setting a constant permittivity 
function e(r) = e m in the whole domain LL The nonpolar energy G NP is computed by 


G np = J [ 7 |VS| + pS + (1 - S)U] dr. 


(9) 


The DG based solvation model is formulated as a coupled GLB and GPB equation system, in which the GLB equation 
provides the solvent solute boundary for solving the GPB, while the GPB equation produces the external potential in the 
GLB equation for the surface evolution. The solution procedure for this coupled system has been discussed in our earlier 
work. 13 ’ 14 Essentially, for the GLB equation, an alternating direction implicit (ADI) scheme is utilized for the time integral, in 
conjugation with the second order finite difference method for the spatial discretization. The GPB equation is discretized by 
a standard second order finite difference scheme and the resulting algebraic equation system is solved by using a standard 
Krylov subspace method based solver. 13 ’ 14 
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III Parametrization methods and algorithms 

To solve the above coupled equation system, a set of parameters that appeared in the GLB equation, namely, surface 
tension 7, hydrodynamic pressure difference p and the product of solvent density p a £j = i ja , should be predetermined. 
Unfortunately, this coupled system is unstable at the certain choices of parameters. Specifically, for certain V, one may 
have S > 1 or S < 0, which leads to unphysical e(S) and unphysical solution of GPB equation (3) and thus gives rise to a 
divergent S. This instability can seriously reduce the model accuracy. 13 ’ 14 

For a concise description of our algorithm, we assume that there is only one solvent component (water) and denote the 
parameter set as: 

P = ,£n t } (10) 

where N T is the number of types of atoms in the solute molecule. 

As mentioned in the previous part, the parameter set P used in solving the coupled PDEs should meet two requirements, 
namely, the stability of solving the coupled PDEs and the optimal prediction of the solvation free energy (or fitting the 
experimental solvation free energy in the best approach). Based on these two criteria we introduce a two-stage numerical 
procedure to optimize the parameter set and solve the coupled PDEs: 


• Explore the stability conditions of the coupled PDEs by introducing an auxiliary system via a small perturbation; 

• Optimize the parameter set by an iteratively scheme satisfying the stability constraint. 


III.A Stability conditions 

In this part we investigate the stability conditions for the numerical solution to the coupled PDEs (2) and (3). The basic 
idea is to utilize a small perturbation method. It is known that omitting the external potential in the GLB equation yields the 
Laplace-Beltrami (LB) equation: 


dS 

~dt 


|V 5 |V- 7 


VS \ 


(ii) 


This equation is of diffusion type and is well posed with the Dirichlet type of boundary conditions provided 7 > 0. Numerically 
it is easy to solve Eq. (11) to yield the profile of the solvent solute boundary. 

After solving the LB equation (11), we use the generated smooth profile of the solvent solute boundary to determine the 
permittivity function in the GPB equation. For simplicity, we consider a pure water solvent, 


—V • (e(S)V$) = Sp m . (12) 

Without the external potential the system of Eqs. (11 )-(12) can be solved stably by first solving the LB equation and then 
the GPB equation. 

Motivated by the above observation, if the external potential is dominated by the mean curvature term, the stability of 
coupled GPB and GLB equations can be preserved. Based on numerical experiments, the Lennard-Jones interaction 
between the solvent and solute is usually small since this term is constrained by the nonpolar free energy in our model. In 
our method, we enforce the following constraint conditions to make the coupled system well-posed in the numerical sense 


7 > 70 > 0, 


(13) 


and 

\p\ < M, 


(14) 


where j 0 and (3 are some appropriate positive constants. 

In summary, the original problem is transformed into optimizing parameters in the following system to attain the best 
solvation free energy fitting with experimental results: 


( 7 |VS|) 


f = |VS| 

-V • (e(S)V$) = S Ptl 

7 > 70 > 0, 

\p\ < M- 


— p + U + 


JV 4 >| 2 -|e 6 


|V<ff 


(15) 


Note that the potential p m <t> is omitted in the GLB equation (15), because we have already enforced the Dirichlet boundary 
condition in the GLB equation, while p m is inside the van der Waals surface. 

Remark 1 . Based on large amount of numerical tests, it is found that there is no need to enforce the constraint conditions on 
the parameters that appear in the Lennard-Jones term. When this term is used to fit the solvation energy with experimental 
results, the parameters can be bounded in a small neighborhood of 0 automatically during the fitting procedure. These 
parameters essentially do not affect the numerical stability. 
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III.B Self-consistent approach for solving the coupled PDEs 

In this part, we propose a self-consistent approach to solve the coupled GLB and GPB equations for a given set of parame¬ 
ters. Basically, the coupled system is solved iteratively until both the electrostatic solvation free energy A G p given in Eq. ( 8 ) 
and the surface function S are both converged. Here the surface function is said to be converged provided that the surface 
area and enclosed volume are both converged. 

We present an algorithm for solving the following coupled systems: 

-V-(e(S)V$) = £>„,, ( 16 ) 


and 

where V e is the external potential which is defined as: 

• Auxiliary system: V e = |(e m - e s )|V$| 2 , 

• Full system: v e = -p + U + \{e m - e s )|V$| 2 . 


! - '-I 


Dirichlet boundary conditions are employed for both GPB (16) and GLB (17) equations with auxiliary and full external 
potentials, giving rise to a well-posed coupled system. The smooth profile of the solvent-solute boundary enables the direct 
use of the second order central finite difference scheme to achieve the second order convergence in discretizing the GPB 
equation. The biconjugate gradient scheme is used to solve the resulting algebraic equation system. The GLB equation of 
both the auxiliary and full systems can be solved by the central finite difference discretization of the spatial domain and the 
forward Euler time integrator for the time domain discretization. 


Remark 2. For the sake of simplicity, in the current work, we employed the central finite difference scheme for spatial 
domain discretization in both GPB and GLB equations, and forward Euler integrator for the time domain discretization of 
GLB equation. For stability consideration, in the discretization of the GLB equation, the discretization step size of temporal 
and spatial domain satisfies the Courant-Friedrichs-Lewy condition. To accelerate the numerical integration, a muitigrid 
solver can be employed for GBP equation, and an alternating direction implicit scheme, 13 which is unconditionally stable, 
can be utilized for the temporal integration. However, detail discussion of these accelerated schemes is beyond the scope 
of the present work. 

A pseudo code is given in Algorithm 1 to offer a general framework for solving the coupled GLB and GPB equations in a 
self-consistent manner. The outer iteration controls the convergence of the GPB equation through measuring the change of 
electrostatic solvation free energy in two adjacent iterations, while the inner iteration controls the convergence of the GLB 
equation based on the variation of surface areas and enclosed volumes through the surface function S. The variables AGf, 
AGf, Areai, Area 2 , Voli, and Vol 2 denote the electrostatic solvation free energy, surface area, and volume enclosed by the 
surface of two immediate iterations, respectively. The parameters ei, e 2 and e 3 are the threshold constants and all set to 0.01 

Algorithm 1 Self-consistent algorithm for the coupled GPB and GLB system 
i: procedure GPB-GLB-Solver 

2: Initialize: AGf = 0, AG^ = 100, Areai = 0, Area 2 = 100, Voli = 0, Vol 2 = 100 

3: do while (|AG^ - AG^| < ei) 

4: AGi <- AG| 

5: do while (I Areai - Area 2 | < e 2 .and. |Voli - Vol 2 | < e 3 ) 

6 : Areai Area 2 , Voli «— Vol 2 . 

7: Update the surface profile function S by solving the GLB equation (17). 

8 : Area 2 = f Q Sdr, Vol 2 = f n |V 5 |dr. 

9 : enddo 

10 : Solve the GPB equation (16) in both vacuum and solvent with the previous updated surface profile. 

11: Update the polar solvation free energy AG% according to Eq. (8). 

12 : enddo 


in the current implementation. 

Remark 3. In solving the GLB equation, during each updating, to ensure the stability instead of the fully update, we update 
it partially, i.e., the updated solution is the weighted sum of the new solution of the current GLB solution S new and the old 
solution of the GLB equation in the previous step S 0 i d : 


S — (JiiSnew + (1 ^l)*^old; 


(18) 


where ai is a constant and set to 0.5 in the present work. 
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III.C Convex optimization for parameter learning 

In this part, we present the parameter optimization scheme. In our approach, parameters start from an initial guess and 
then are updated sequentially until reaching the convergence. Here the convergence is measured by the root mean square 
(RMS) error between the fitted and experimental solvation free energies for a given set of molecules. 

Consider the parameter optimization for a given group of molecules, denoted as {T 1 ,T 2 ,--- ,T n }. As discussed above 
the parameter set is P. To optimize the parameter set P, we start from GPB equation (16) and the auxiliary system of GLB 
equation (17) with 7 = 0.05. After solving the initial coupled system by using Algorithm 1, we obtain the following quantities 
for each molecule in the training set: 



where j = 1,2,- •• ,n. Here N m and N T denote the number of atoms and types of atoms in a specific molecule. The last 
few terms involve semi-discrete and semi-continuum Lennard-Jones potentials. 13 Additionally, 

£ _ ( 1) if a f° m i belongs to type j, 
i ~ \ 0, otherwise. 

where 7 = 1,2,--- , N m \ j = 1,2, • • • iV T ; a*, i = 1,2, • • • , N T is the atomic radius of the 7th type of atoms. Therefore, atoms 
of the same type have a common atomic radius and fitting parameter i. 

The predicted solvation free energy for molecule j can be represented as: 



We denote the predicted solvation free energy for the given set of molecules as AG(P) = {AGi,AG 2 ,-- • , AG„}, which 
is a function of the parameter set P, and denote the corresponding experimental solvation free energy as AG Exp = 

{AG Expl , AG Exp2 ,-• • ,AG Exp ”}. 

Then the parameter optimization problem in the coupled PDEs given by Eqs. (15) can be transformed into the following 
regularized and constrained optimization problem: 

mm (||AG(P) - AG Exp || 2 + A||P|| 2 ) , (24) 

s.t. 

7 > 7o, (25) 

and 

\p\ < /?7> (26) 

where || * || 2 is the L 2 norm of the quantity * and A is the regularization parameter chosen to be 10 in the present work to 
ensure the dominance of the first term and avoid overfitting. Here 70 and 3 are set respectively to 0.05 and 0.1 in the present 
implementation, which guarantees the stability of the coupled system according to a large amount of numerical tests. 

It is obvious that the objective function (24) in the optimization is a convex function, meanwhile the solution domain 
restricted by constraints (25)-(26) forms a convex domain. Therefore the optimization problem given by Eqs. (24)-(26) is a 
convex optimization problem, which was studied by Grant and Boyd. 37,38 

After solving the above convex optimization problem, parameter set P is updated and used again in solving the coupled 
GLB and GPB system, i.e., Eqs. (17) and (16). Repeating the above procedure, a new group of predicted solvation free 
energies together with a new group of parameters is obtained. This procedure is repeated until the RMS error between the 
predicted and experimental solvation free energies in two sequential iterations is within a given threshold. 



8 



Solvent radius (A) 


Figure 1: The relation between the solvent radii and the RMS error of the SAMPLO test set. The local minimum appears at the solvent radii 3.0 A, with 
RMS error being 0.6 kcal/mol for a set of 17 molecules. 

III.D Algorithm for parameter optimization and solution of the coupled PDEs 

Based on the preparation made in the previous two subsections, namely, the self-consistent approach for solving the coupled 
GLB and GPB system and the parameter optimization, we provide the combined algorithm for the parameter optimization 
and solving the coupled system for a given set of molecules. 

Algorithm 2 offers a parameter learning pseudo code for a given group of molecules. This algorithm is formulated by 
combining outer and inner self-consistent iterations. The outer iteration controls the convergence of the optimized param¬ 
eters via two controlling parameters, Err! and Err 2 , denoting the RMS error between predicted and experimental solvation 
free energies in two sequential iterations. The inner iteration implements the solution to the GLB and GPB equations by 
Algorithm 1. 

Algorithm 2 Parameters learning for a given group of molecules 

i: procedure Parameters-Learning 

2 : Initialize: Erri = 0, Err 2 = 100 

3: Solve the coupled GPB and GLB system, where GLB utilizes the auxiliary equation (17). 

4: Solve the constrained optimization problem Eqs. (24)-(26) to obtain the initial parameter set P 0 . 

5: Update Eri'! to be the RMS error between experimental and predict results in the above step. 

6: do While (|Erri - Err 2 | < e 4 ) 

7 : Err 2 Erri. 

8: Solve the coupled GPB and GLB system, where GLB system with parameters set P 0 . 

9: Solve the constrained optimization problem Eqs. (24)-(26) to get the updated parameters set P. 

10 : Update Err! to be RMS error between experimental and predict results in the previous optimization step. 

ii: Update P 0 <- P. 

12 : enddo 

The threshold parameter e 4 is set to 0.01 in the present work. 

IV Numerical results 

In this section we present the numerical study of the DG based solvation model using the proposed parameter optimization 
algorithms. We first explore the optimal solvent radius used in the van der Waals interactions. Due to the high nonlinearity, 
the solvent radius cannot be automatically optimized and its optimal value is obtained via searching the parameter domain. 
We show that for a group of molecules, there is a local minimum in the RMS error when the solvent radius is varied. The 
corresponding optimal solvent radius is adopted for other molecules. Additionally, we consider a large number of molecules 
with known experimental solvation free energies to test the proposed parameter optimization algorithms. These molecules 
are of both polar and nonpolar types and are divided into six groups: the SAMPLO test set, 53 the alkane, alkene, ether, 
alcohol and phenol types. 50 It is found that our DG based solvation model works really well for these molecules. Finally, 
to demonstrate the predictive power of the present DG based solvation model, we perform a five-fold cross validation 39 for 
alkane, alkene, ether, alcohol and phenol types of molecules. It is found that training and validation errors are of the same 
level, which confirms the ability of our model for the solvation free energy prediction. 

The SAMPLO molecule structural conformations are adopted from the literature with ZAP 9 radii and the OpenEye-AMI- 
BCC vl charges. 53 For other molecules, structural conformations are obtained from FreeSolv. 50 Amber GAFF force field is 
utilized for the charge assignment. 9 The van der Waals radii as well as the atomic radii of Hydrogen, Carbon and Oxygen 
atoms are set to 1.2, 1.7 and 1.5A, respectively. The grid spacing is set to 0.25A in all of our calculations (discretization and 
integration). The computational domain is set to the bounding box of the solute molecule with an extra buffer length of 6.0 
A. 
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Exptl solvation energy (kcal/mol) 

Figure 2: The predicted and experimental solvation free energy for the 17 molecules in the SAMPLO test set. 


Table 1: The solvation free energy prediction for the SAMPLO set. Energy is in the unit of kcai/mol. 


Name 

A 

qNP 

AG 


Error 

Glycerol triacetate 

-10.60 

2.53 

-8.07 

-8.84 

-0.77 

Benzyl bromide 

-4.31 

1.93 

-2.38 

-2.38 

0.00 

Benzyl chloride 

-4.45 

1.18 

-3.27 

-1.93 

1.34 

m-Bis (trifluoromethyl) benzene 

-2.62 

3.70 

1.08 

1.07 

-0.01 

N,N-Dimethyl-p-methoxybenzamide 

-8.35 

-2.22 

-10.57 

-11.01 

-0.45 

N,N-4-Trimethylbenzamide 

-6.93 

-3.09 

-10.03 

-9.76 

0.27 

bis-2-Chloroethyl ether 

-3.73 

-0.14 

-3.59 

-4.23 

-0.64 

1,1-Diacetoxyethane 

-7.07 

2.00 

-5.07 

-4.97 

0.10 

1,1-Diethoxyethane 

-3.58 

0.43 

-3.15 

-3.28 

-0.13 

1,4-Dioxane 

-5.36 

-0.38 

-5.74 

-5.05 

0.69 

Diethyl propanedioate 

-7.07 

1.40 

-5.67 

-6.00 

-0.33 

Dimethoxymethane 

-4.09 

1.19 

-2.90 

-2.93 

-0.03 

Ethylene glycol diacetate 

-7.66 

1.90 

-5.76 

-6.34 

-0.58 

1,2-Diethoxyethane 

-3.64 

0.45 

-4.09 

-3.54 

0.55 

Diethyl sulfide 

-2.21 

0.76 

-1.47 

-1.43 

0.04 

Phenyl formate 

-7.10 

2.08 

-5.02 

-4.08 

0.94 

Imidazole 

-11.54 

2.71 

-8.83 

-9.81 

-0.98 

RMS 





0.60 


IV.A Solvent radius 

In the present method, the van der Waals radii of solute atoms are employed to define the van der Waals surface, which 
is used for setting up the boundary condition for the GLB equation. Additionally, solvent and solute atomic radii are used 
in the Lennard-Jones potentials. Atomic radii of solute are set to the van der Waals radii in the present work, whereas the 
solvent radius is considered an optimization parameter. We utilize a brute force approach for the solvent radii selection. We 
employ the SAMPLO test set 53 as a benchmark. The solvent radius is varied from 0.5 A to 5.5 A away from van der Waals 
surface. Due to the fast decay property of the Lennard-Jones interactions, the above setting enables the full inclusion of the 
Lennard-Jones interactions in our model. 

Figure 1 depicts the RMS error of the 17 molecules from the SAMPLO set at the different solvent radii calculated from the 
present DG based solvation model. The result clearly demonstrates that with the increase of the solvent radius, the RMS 
error decreases dramatically initially. The minimum appears at 3.0 A. The further increase of the solvent radius leads to a 
rapid jump in the RMS error before it stabilizes around 1.54 kcal/mol. It is noted that 3.0 A is much larger than the commonly 
used solvent radius of 1.4 A in Poisson-Boltzmann based implicit solvent models. However, unlike in the commonly used 
implicit solvent models, the Lennard-Jones potential in our DG based solvation model is of half discrete and half continuum. 
Therefore, the solvent radius is an on-grid average value in the DG based solvation model. In all the following computations, 
the solvent radius is set to 3.0 A. 

IV.B Optimization results 

In this section, we illustrate the performance of our parameter optimization algorithms. First, we provide the regression 
results of the SAMPLO test set. 53 Figure 2 shows the predicted and experimental solvation free energies based on the 
present model and optimization method. It is obvious that predicted solvation free energies are highly consistent with the 
experimental ones. The RMS error is 0.60 kcal/mol. 

Table 1 shows the breakup of polar, nonpolar and total predicted solvation free energies. The experimental values and 
errors are also provided. 53 
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Figure 3: The predicted and experimental solvation free energies for 38 alkane molecules. 
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Figure 4: The predicted and experimental solvation free energies for 22 alkene molecules. 


Compared to our earlier prediction 13 in which the same model is employed but the parameters were not optimized in 
the present manner, the RMS error decreases dramatically from previous 1.76 kcal/mol to 0.60 kcal/mol for the same test 
set. Note that the present RMS error (0.60 kcal/mol) is also significantly smaller than that of the explicit solvent approach 
(1.71 ± 0.05 kcal/mol) and that obtained by the PB based prediction (1.87 kcal/mol) under the same structure, charge and 
radius setting. 53 The present results confirm the efficiency of the proposed new parameter optimization algorithms and 
demonstrate the accuracy and power of our DG based solvation models. 

Additionally, we investigate the solvation free energies prediction of two families of nonpolar molecules, alkane and alkene, 
which were studied previous by using our DG based nonpolar solvation model. 16 In the following, we demonstrate that the 
present DG based full solvation model can provide the same level of accuracy in the solvation free energy prediction for 
alkane and alkene molecules. 

Figures 3 and 4 depict the predicted and experimental solvation free energies for 38 alkane and 22 alkene molecules, 
respectively. Tables 2 and 3 list the polar, nonpolar, total and experimental solvation free energies for both families of solute 
molecules, respectively. Except for one alkane molecule, namely, cycloprotane, whose predicted error is 1.60 kcal/mol, 
the errors for all other molecules are within 1 kcal/mol. The RMS errors of these two families are 0.36 and 0.46 kcal/mol, 
respectively. This level of accuracy is similar to our earlier results obtained by using our DG based nonpolar solvation 
model, 16 which does not involve the electrostatic (polar) model and is computationally easier to optimize. 

It is interesting to note that for both alkane and alkene molecules, the polar solvation free energy contribution is very small 
and the nonpolar part dominates the solvation free energy contribution, which explains why the DG based nonpolar solvation 
model works extremely well for the solvation free energy prediction of alkane and alkene molecules. 16 Further, note that 
for almost all the alkane molecules, the polar solvation free energies AG p are of magnitude 0.01 kcal/mol, while alkene 
molecules have slightly larger magnitude polar free energies, which further verifies that alkene molecules has a stronger 
polarity than alkane molecules in general. 

Finally, we analyze three classes of polar solute molecules, namely, ether, alcohol, and phenol molecules. Figures 5, 6 
and 7 illustrate the predicted and experimental solvation free energies for 17 ether, 25 alcohol, and 18 phenol molecules, 
respectively. Tables 4, 5 and 6 list the polar, nonpolar, total and experimental solvation free energies for the corresponding 
families of solute molecules. The RMS errors of these three families are 0.36, 0.33, and 0.76 kcal/mol, respectively. 

From the results listed in Tables 4, 5 and 6 we note that for ether molecules, all the nonpolar energies are positive which 
neutralizes some polar contributions to the total solvation free energies. For the alcohol molecules, the nonpolar energies 
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Table 2: The solvation free energy prediction for the alkane set. All energies are in the unit of kcal/mol. 


Name 

A G F 


AG AG KxpSU 

Error 

octane 

-0.13 

2.89 

2.76 

2.88 

0.12 

ethane 

-0.04 

1.70 

1.66 

1.83 

0.17 

propane 

-0.05 

1.83 

1.78 

2.00 

0.22 

cyclopropane 

-0.08 

2.43 

2.35 

0.75 

-1.60 

isobutane 

-0.07 

2.09 

2.02 

2.30 

0.28 

2,2-dimethylbutane 

-0.07 

2.34 

2.27 

2.51 

0.24 

isopentane 

-0.07 

2.19 

2.12 

2.38 

0.26 

2,3-dimethylbutane 

-0.07 

2.41 

2.34 

2.34 

0.00 

3-methylpentane 

-0.08 

2.43 

2.35 

2.51 

0.16 

methylcyclopentane 

-0.10 

1.76 

1.66 

1.59 

-0.07 

n-butane 

-0.07 

2.03 

1.96 

2.10 

0.14 

isohexane 

-0.09 

2.49 

2.40 

2.51 

0.11 

2,4-dimethylpentane 

-0.09 

2.57 

2.48 

2.83 

0.35 

methylcyclohexane 

-0.10 

1.68 

1.58 

1.70 

0.12 

n-pentane 

-0.08 

2.25 

2.17 

2.30 

0.13 

hexane 

-0.09 

2.51 

2.42 

2.48 

0.06 

cyclohexane 

-0.10 

1.40 

1.30 

1.23 

-0.07 

nonane 

-0.14 

3.11 

2.97 

3.13 

0.16 

heptane 

-0.11 

2.73 

2.62 

2.67 

0.05 

cyclopentane 

-0.10 

1.54 

1.44 

1.20 

-0.24 

cycloheptane 

-0.11 

1.56 

1.45 

0.80 

-0.65 

cyclooctane 

-0.12 

1.69 

1.57 

0.86 

-0.71 

neopentane 

-0.06 

2.13 

2.07 

2.51 

0.44 

2,2,4-trimethylpentane 

-0.08 

2.74 

2.66 

2.89 

0.23 

3,3-dimethylpentane 

-0.07 

2.58 

2.51 

2.56 

0.05 

2,3-dimethylpentane 

-0.08 

2.72 

2.64 

2.52 

-0.12 

2,3,4-trimethylpentane 

-0.08 

2.96 

2.88 

2.56 

-0.32 

1,2-dimethylcyclohexane 

-0.10 

2.02 

1.92 

1.58 

-0.34 

3-methylhexane 

-0.09 

2.74 

2.65 

2.71 

0.06 

3-methylheptane 

-0.11 

2.94 

2.83 

2.97 

0.14 

1,4-dimethylcyclohexane 

-0.11 

2.02 

1.91 

2.11 

0.20 

2,2-dimethylpentane 

-0.08 

2.64 

2.56 

2.88 

0.32 

2-methylhexane 

-0.10 

2.73 

2.63 

2.93 

0.30 

decane 

-0.16 

3.37 

3.21 

3.16 

-0.06 

propylcyclopentane 

-0.12 

2.21 

2.09 

2.13 

0.03 

cis-1,2-Dimethylcyclohexane 

-0.09 

1.95 

1.86 

1.58 

-0.28 

2,2,5-trimethylhexane 

-0.09 

3.15 

3.06 

2.93 

-0.13 

pentylcyclopentane 

-0.15 

2.73 

2.58 

2.55 

-0.04 

RMS 





0.36 



Exptl solvation energy (kcal/mol) 

Figure 5: The predicted and experimental solvation free energy for the 17 ether molecules. 

are all negative, which enhance the contributions of the polar contributions to the total solvation free energies. Since the 
surface part is always positive and the volume part is mostly positive, the attractive van der Waals interactions between 
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Table 3: The solvation free energy prediction for the alkene set. All energies are in the unit of kcal/mol. 


Name 

A G p 

G NP 

A G 


Error 

ethylene 

-0.27 

0.96 

0.69 

1.28 

0.59 

isoprene 

-0.62 

1.97 

1.35 

0.68 

-0.67 

but-1-ene 

-0.29 

1.17 

0.88 

1.38 

0.50 

butadiene 

-0.56 

1.75 

1.19 

0.56 

-0.63 

pent-1-ene 

-0.30 

1.57 

1.27 

1.68 

0.41 

prop-1-ene 

-0.32 

1.03 

0.71 

1.32 

0.61 

2-methylprop-1-ene 

-0.37 

1.26 

0.89 

1.16 

0.27 

cyclopentene 

-0.37 

1.17 

0.79 

0.56 

-0.23 

2-methylbut-2-ene 

-0.40 

1.28 

0.87 

1.31 

0.44 

2,3-dimethylbuta-1,3-diene 

-0.65 

2.01 

1.36 

0.40 

-0.95 

3-methylbut-1-ene 

-0.27 

1.45 

1.18 

1.83 

0.65 

1 -methylcyclohexene 

-0.38 

1.50 

1.11 

0.67 

-0.45 

penta-1,4-diene 

-0.53 

1.91 

1.38 

0.93 

-0.45 

hex-1-ene 

-0.30 

1.81 

1.50 

1.58 

0.08 

hexa-1,5-diene 

-0.51 

1.88 

1.37 

1.01 

-0.36 

hept-1-ene 

-0.33 

2.17 

1.84 

1.66 

-0.18 

hept-2-ene 

-0.34 

1.96 

1.62 

1.68 

0.06 

4-Methyl-1-pentene 

-0.26 

1.71 

1.45 

1.91 

0.46 

2-methylpent-1-ene 

-0.33 

1.75 

1.42 

1.47 

0.05 

non-1-ene 

-0.36 

2.81 

2.45 

2.06 

-0.39 

trans-2-Heptene 

-0.34 

1.90 

1.56 

1.66 

0.10 

trans-2-Pentene 

-0.30 

1.26 

0.96 

1.34 

0.38 

RMS 





0.46 



Exptl solvation energy (kcal/mol) 

Figure 6: The predicted and experimental solvation free energy for the 25 alcohol molecules. 



Exptl solvation energy (kcal/mol) 

Figure 7: The predicted and experimental solvation free energy for the 18 phenol molecules. 

alcohol molecules and water solvent must be very strong, which explains that alcohol molecules are easily solvated. As for 
the phenol molecules, there is a mixed pattern for the nonpolar contributions. 
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Table 4: The solvation free energy prediction for the ether set. All energies are in the unit of kcal/mol. 


Name 

A G F 

—qNF~ 

AG AG Kxpau 

Error 

ethoxyethane 

-4.08 

2.33 

-1.75 

-1.59 

0.16 

2-methyltetrahydrofuran 

-4.10 

1.43 

-2.67 

-3.30 

-0.63 

tetrahydrofuran 

-4.36 

1.36 

-3.00 

-3.47 

-0.47 

1-propoxypropane 

-3.75 

2.29 

-1.46 

-1.16 

0.30 

methoxymethane 

-4.55 

2.26 

-2.29 

-1.91 

0.36 

tetrahydropyran 

-4.17 

1.09 

-3.07 

-3.12 

-0.05 

1-butoxybutane 

-3.88 

2.33 

-1.55 

-0.83 

0.72 

trimethoxymethane 

-7.57 

3.51 

-4.06 

-4.42 

-0.36 

methoxyethane 

-4.35 

2.29 

-2.06 

-2.10 

-0.04 

1-methoxypropane 

-4.08 

2.24 

-1.84 

-1.66 

0.18 

2-methoxypropane 

-4.12 

2.20 

-1.92 

-2.01 

-0.09 

1-Ethoxypropane 

-4.26 

2.32 

-1.94 

-1.81 

0.13 

1,3-Dioxolane 

-6.09 

1.81 

-4.28 

-4.10 

0.18 

2,5-dimethyltetrahydrofuran 

-3.86 

1.42 

-2.44 

-2.92 

-0.48 

1,1,1 -trimethoxyethane 

-7.58 

3.46 

-4.12 

-4.42 

-0.30 

2-methoxy-2-methyl-propane 

-3.88 

1.97 

-1.91 

-2.21 

-0.30 

1,4-dioxane 

-7.09 

1.66 

-5.44 

-5.06 

0.38 

RMS 





0.36 


Table 5: The solvation free energy prediction for the alcohol set. All energies are in the unit of kcal/mol. 


Name 

AG 13- 

G Nh> 

AG A G Bxp3U 

Error 

ethylene glyco 

-6.98 

-1.76 

-8.73 

-9.30 

-0.57 

butan-1-o 

-3.33 

-1.51 

-4.84 

-4.72 

0.12 

ethano 

-3.49 

-1.47 

-4.96 

-5.00 

-0.04 

methano 

-3.69 

-1.41 

-5.10 

-5.10 

0.00 

propan-1-o 

-3.34 

-1.48 

-4.82 

-4.85 

-0.03 

propan-2-o 

-3.26 

-1.36 

-4.62 

-4.74 

-0.12 

pentan-1-o 

-3.36 

-1.61 

-4.97 

-4.57 

0.40 

2-methylpropan-2-o 

-3.10 

-1.27 

-4.37 

-4.47 

-0.10 

2-methylbutan-2-o 

-2.95 

-1.17 

-4.12 

-4.43 

-0.31 

2-methylpropan-1-o 

-3.20 

-1.50 

-4.70 

-4.50 

0.20 

butan-2-o 

-3.09 

-1.32 

-4.40 

-4.62 

-0.22 

cyclopentano 

-3.20 

-1.68 

-4.88 

-5.49 

-0.61 

4-methylpentan-2-o 

-2.65 

-1.05 

-3.69 

-3.73 

-0.04 

cyclohexano 

-3.21 

-1.92 

-5.13 

-5.46 

-0.33 

hexan-1-o 

-3.43 

-1.53 

-4.96 

-4.40 

0.56 

heptan-1-o 

-3.48 

-1.62 

-5.09 

-4.21 

0.88 

2-methylbutan-1-o 

-3.27 

-1.29 

-4.56 

-4.42 

0.14 

cycloheptano 

-3.07 

-1.89 

-4.96 

-5.48 

-0.52 

2-methylpentan-3-o 

-2.86 

-0.93 

-3.78 

-3.88 

-0.10 

pentan-3-o 

-3.01 

-1.08 

-4.10 

-4.35 

-0.25 

4-Heptano 

-2.90 

-1.10 

-3.99 

-4.01 

-0.02 

2-methylpentan-2-o 

-2.93 

-1.08 

-4.00 

-3.92 

0.08 

2,3-Dimethyl-2-butano 

-2.89 

-0.93 

-3.82 

-3.91 

-0.09 

hexan-3-o 

-3.04 

-1.27 

-4.31 

-4.06 

0.25 

pentan-2-o 

-3.10 

-1.23 

-4.33 

-4.39 

-0.06 

RMS 




0.33 


The above study of a large variety of molecules indicates that our DG based solvation model together with the pro¬ 
posed parameter optimization algorithms can provide very accurate predictions of solvation free energies for both polar and 
nonpolar solute molecules. 

IV.C Five-fold cross validation 

Having verified that our DG based solvation model with the optimized parameters provides very good regression results, we 
perform a five-fold cross validation to further illustrate the predictive power of the present method for independent data sets. 
Specifically, the parameters learned from a group of molecules can be employed for the blind prediction of other molecules. 

To perform the five-fold cross validation, each type of molecules is subdivided into five sub-groups as uniformly as pos¬ 
sible, Table 7 lists the number of molecules in each sub-group for each type of molecules. In our parameters optimization, 
we leave out one sub-group of molecules and use the rest of molecules to establish our DG based solvation model. The 
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Table 6: The solvation free energy prediction for the phenol set. All energies are in the unit of kcal/mol. 


Name 


A 

£»1NP 

AG 

AgUxpbU 

Error 

3-hydroxybenzaldehyde 


-9.17 

0.39 

-8.78 

-9.52 

-0.74 

4-hydroxybenzaldehyde 


-9.60 

0.19 

-9.41 

-8.83 

0.58 

o-cresol 


-5.32 

-1.04 

-6.36 

-5.90 

0.46 

m-cresol 


-5.71 

-0.86 

-6.57 

-5.49 

1.08 

phenol 


-5.81 

-0.14 

-6.95 

-6.61 

0.34 

p-cresol 


-5.80 

-1.05 

-6.85 

-6.13 

0.72 

naphthalen-1-ol 


-5.50 

-0.75 

-6.25 

-7.67 

-1.42 

3,4-dimethylphenol 


-5.72 

-0.49 

-6.21 

-6.50 

-0.29 

2,5-dimethylphenol 


-5.34 

-0.48 

-5.82 

-5.91 

-0.09 

4-tert-butylphenol 


-5.55 

0.86 

-4.69 

-5.91 

-1.22 

2,4-dimethylphenol 


-5.55 

-1.03 

-6.58 

-6.01 

0.57 

3,5-dimethylphenol 


-5.69 

-0.41 

-6.10 

-6.27 

-0.17 

naphthalen-2-ol 


-5.85 

-0.72 

-6.57 

-8.11 

-1.54 

2,3-dimethylphenol 


-5.47 

-1.13 

-6.60 

-6.16 

0.44 

2,6-dimethylphenol 


-5.07 

-1.07 

-6.14 

-5.26 

0.88 

3-ethylphenol 


-5.67 

-0.37 

-6.04 

-6.25 

-0.21 

4-propylphenol 


-5.79 

-0.05 

-5.84 

-5.21 

0.63 

4-ethylphenol 


-5.76 

-0.48 

-6.24 

-6.13 

0.11 

RMS 






0.76 

Table 7: The partition of the molecules into sub-groups. 

Molecule Group 1 

Group 2 

Group 3 Group 4 Group 5 

Alkane 8 

8 


8 

7 

7 


Alkene 5 

5 


5 

4 

4 


Ether 4 

4 


3 

3 

3 


Alcohol 5 

5 


5 

5 

5 


Phenol 4 

4 


4 

3 

3 




Group 

Figure 8: The bar plot of the training and validation errors of alkanes. 

optimized parameters are then employed for the blind prediction of solvation free energies of the left out sub-group of 
molecules. 

Figures 8, 9,10, 11, and 12 demonstrate the cross validation results of the alkane, alkene, ether, alcohol, and phenol 
molecules, respectively. It is seen that training and validation errors are similar to each other, which verifies the ability of our 
model in the blind prediction of solvation free energies. 

In the real prediction of the solvation free energy for a given molecule of unknown category, we can first assign it to a 
given group, and then employ the DG based solvation model with the optimal parameters learned for this specific group for 
a blind prediction. 

V Conclusion 

Differential geometry (DG) based solvation models have had a considerable success in solvation analysis. 13-15 ’ 70 Particu¬ 
larly, our DG based nonpolar solvation model was shown to offer some of the most accurate solvation energy predictions 
of various nonpolar molecules. 16 However, our DG based full solvation model is subject to numerical instability in solving 
the generalized Laplace-Beltrami (GLB) equation, due to its coupling with the generalized Poisson Boltzmann (GPB) equa¬ 
tion. To stabilize the coupled GLB and GPB equations, a strong constraint on the van der Waals interaction was applied 
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Group 

Figure 9: The bar plot of the training and validation errors of alkenes. 
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Figure 10: The bar plot of the training and validation errors of the ethers. 



Group 

Figure 11: The bar plot of the training and validation errors of alcohols. 

in our earlier work, 13-15 which hinders the parameter optimization of our DG based solvation model. In the present work, 
we resolve this problem by introducing new parameter optimization algorithms, namely perturbation method and convex 
optimization, for the DG based solvation model. New stability conditions are explicitly imposed to the parameter selection, 
which guarantees the stability and robustness of solving the GLB equation and leads to constrained optimization of our DG 
based solvation model. The new optimization algorithms are intensively validated by using a large number of test molecules, 
including the SAMPLO test set, 53 alkane, alkene, ether, alcohol and phenol types of solutes. Regression results based on 
our new algorithms are consistent extremely well with experimental data. Additionally, a five-fold cross validation technique 
is employed to explore the ability of our DG based solvation models for the blind prediction of the solvation free energies 
for a variety of solute molecules. It is found that the same level of errors is found in the training and validation sets, which 
confirms our model’s predictive power in solvation free energy analysis. The present DG based full solvation model provides 
a unified framework for analyzing both polar and nonploar molecules. In our future work, we will develop machine learning 
approaches for the robust classification of solute molecules of interest into appropriate categories so as to better predict 
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Group 


Figure 12: The bar plot of the training and validation errors of phenols. 


their solvation free energies. 

Acknowledgments 

This work was supported in part by NSF grants IIS-1302285 and DMS-1160352, NIH Grant R01GM-090208, and MSU 

Center for Mathematical Molecular Biosciences Initiative. The authors thank Nathan Baker for valuable comments. 

References 

[1] N. A. Baker. Improving implicit solvent simulations: a Poisson-centric view. Current Opinion in Structural Biology , 15(2):137—43, 
2005. 

[2] D. Bashford and D. A. Case. Generalized Born models of macromolecular solvation effects. Annual Review of Physical Chemistry, 
51:129-152,2000. 

[3] P. W. Bates, Z. Chen, Y. H. Sun, G. W. Wei, and S. Zhao. Geometric and potential driving formation and evolution of biomolecular 
surfaces. J. Math. Biol., 59:193-231, 2009. 

[4] P. W. Bates, G. W. Wei, and S. Zhao. The minimal molecular surface. arXiv:q-bio/0610038v1, [q-bio.BM], 2006. 

[5] P. W. Bates, G. W. Wei, and S. Zhao. The minimal molecular surface. Midwest Quantitative Biology Conference , Mission Point 
Resort, Mackinac Island, MhSeptember 29 - October 1,2006. 

[6] P. W. Bates, G. W. Wei, and S. Zhao. Minimal molecular surfaces and their applications. Journal of Computational Chemistry , 
29(3):380-91,2008. 

[7] D. Beglov and B. Roux. Solvation of complex molecules in a polar liquid: an integral equation theory. Journal of Chemical Physics, 
104(21 ):8678-8689, 1996. 

[8] C. A. S. Bergstrom, M. Strafford, L. Lazorova, A. Avdeef, K. Luthman, and P. Artursson. Absorption classification of oral drugs based 
on molecular surface properties. Journal of Medicinal Chemistry, 46(4):558-570, 2003. 

[9] D. A. Case, J. T. Berryman, R. M. Betz, D. S. Cerutti, T. E. C. Ill, T. A. Darden, R. E. Duke, T. J. Giese, H. Gohlke, A. W. Goetz, 
N. Homeyer, S. Izadi, P. Janowski, J. Kaus, A. Kovalenko, T. S. Lee, S. LeGrand, P. Li, T. Luchko, R. Luo, B. Madej, K. M. Merz, 
G. Monard, P. Needham, H. Nguyen, H. T. Nguyen, I. Omelyan, A. Onufriev, D. R. Roe, A. Roitberg, R. Salomon-Ferrer, C. L. 
Simmerling, W. Smith, J. Swails, R. C. Walker, J. Wang, R. Wolf, X. Wu, D. M. York, and P. A. Kollman. Amber 2015. University of 
California, San Francisco, 2015. 

[10] D. Chen, Z. Chen, and G. W. Wei. Quantum dynamics in continuum for proton transport II: Variational solvent-solute interface. 
International Journal for Numerical Methods in Biomedical Engineering, 28:25 - 51,2012. 

[11] D. Chen and G. W. Wei. Quantum dynamics in continuum for proton transport—Generalized correlation. J Chem. Phys., 136:134109, 
2012 . 

[12] D. Chen, G. W. Wei, X. Cong, and G. Wang. Computational methods for optical molecular imaging. Communications in Numerical 
Methods in Engineering, 25:1137-1161,2009. 

[13] Z. Chen, N. A. Baker, and G. W. Wei. Differential geometry based solvation models I: Eulerian formulation. J. Comput. Phys., 
229:8231-8258,2010. 

[14] Z. Chen, N. A. Baker, and G. W. Wei. Differential geometry based solvation models II: Lagrangian formulation. J. Math. Biol., 
63:1139-1200,2011. 

[15] Z. Chen and G. W. Wei. Differential geometry based solvation models III: Quantum formulation. J. Chem. Phys., 135:194108, 2011. 

[16] Z. Chen, S. Zhao, J. Chun, D. G. Thomas, N. A. Baker, P. B. Bates, and G. W. Wei. Variational approach for nonpolar solvation 
analysis. Journal of Chemical Physics, 137(084101), 2012. 

[17] L. T. Cheng, J. Dzubiella, A. J. McCammon, and B. Li. Application of the level-set method to the implicit solvation of nonpolar 
molecules. Journal of Chemical Physics, 127(8), 2007. 

[18] M. L. Connolly. Analytical molecular surface calculation. Journal of Applied Crystallography, 16(5) :548—558, 1983. 

[19] M. L. Connolly. Depth buffer algorithms for molecular modeling. J. Mol. Graphics, 3:19-24, 1985. 































17 


[20] P. B. Crowley and A. Golovin. Cation-pi interactions in protein-protein interfaces. Proteins: Structure, Function, and Bioinformatics, 
59(2):231-239, 2005. 

[21] M. Daily, J. Chun, A. Heredia-Langner, G. W. Wei, and N. A. Baker. Origin of parameter degeneracy and molecular shape relation¬ 
ships in geometric-flow calculations of solvation free energies. Journal of Chemical Physics,, 139:204108, 2013. 

[22] L. David, R. Luo, and M. K. Gilson. Comparison of generalized Born and Poisson models: Energetics and dynamics of HIV protease. 
Journal of Computational Chemistry, 21 (4):295-309, 2000. 

[23] M. E. Davis and J. A. McCammon. Electrostatics in biomolecular structure and dynamics. Chemical Reviews, 94:509-21, 1990. 

[24] B. N. Dominy and C. L. Brooks, III. Development of a generalized Born model parameterization for proteins and nucleic acids. 
Journal of Physical Chemistry B, 103(18):3765-3773, 1999. 

[25] F. Dong, M. Vijaykumar, and H. X. Zhou. Comparison of calculation and experiment implicates significant electrostatic contributions 
to the binding stability of barnase and barstar. Biophysical Journal, 85(1):49-60, 2003. 

[26] F. Dong and H. X. Zhou. Electrostatic contribution to the binding stability of protein-protein complexes. Proteins, 65(1 ):87—102, 2006. 

[27] A. I. Dragan, C. M. Read, E. N. Makeyeva, E. I. Milgotina, M. E. Churchill, C. Crane-Robinson, and P. L. Privalov. DNA binding and 
bending by HMG boxes: Energetic determinants of specificity. Journal of Molecular Biology, 343(2) :371 -393, 2004. 

[28] J. Dzubiella, J. M. J. Swanson, and J. A. McCammon. Coupling hydrophobicity, dispersion, and electrostatics in continuum solvent 
models. Physical Review Letters, 96:087802, 2006. 

[29] F. Fogolari, A. Brigo, and H. Molinari. The Poisson-Boltzmann equation for biomolecular electrostatics: a tool for structural biology. 
Journal of Molecular Recognition, 15(6):377-92, 2002. 

[30] E. Gallicchio, M. M. Kubo, and R. M. Levy. Enthalpy-entropy and cavity decomposition of alkane hydration free energies: Numerical 
results and implications for theories of hydrophobic solvation. Journal of Physical Chemistry B, 104(26):6271-6285, 2000. 

[31] E. Gallicchio, L. Y. Zhang, and R. M. Levy. The SGB/NP hydration free energy model based on the surface generalized Born solvent 
reaction field and novel nonpolar hydration free energy estimators. Journal of Computational Chemistry, 23(5):517-29, 2002. 

[32] W. Geng, S. Yu, and G. W. Wei. Treatment of charge singularities in implicit solvent models. Journal of Chemical Physics, 
127:114106,2007. 

[33] M. K. Gilson, M. E. Davis, B. A. Luty, and J. A. McCammon. Computation of electrostatic forces on solvated molecules using the 
Poisson-Boltzmann equation. Journal of Physical Chemistry, 97(14):3591-3600, 1993. 

[34] J. Grant and B. Pickup. A gaussian description of molecular shape. Journal of Physical Chemistry, 99:3503-3510, 1995. 

[35] J. A. Grant, B. T. Pickup, and A. Nicholls. A smooth permittivity function for Poisson-Boltzmann solvation methods. Journal of 
Computational Chemistry, 22(6):608-640, 2001. 

[36] J. A. Grant, B. T. Pickup, M. T. Sykes, C. A. Kitchen, and A. Nicholls. The Gaussian Generalized Born model: application to small 
molecules. Physical Chemistry Chemical Physics, 9:4913-22, 2007. 

[37] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, 
Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95-110. Springer-Verlag 
Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html. 

[38] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, Mar. 2014. 

[39] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning: Data mining, inference, and prediction. In The 
Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer, 2009. 

[40] C. Holm, P. Kekicheff, and R. Podgornik. Electrostatic effects in soft matter and biophysics; NATO Science Series. Kluwer Academic 
Publishers, Boston, 2001. 

[41] B. Honig and A. Nicholls. Classical electrostatics in biology and chemistry. Science, 268(5214):1144-9, 1995. 

[42] R. M. Jackson and M. J. Sternberg. A continuum model for protein-protein interactions: Application to the docking problem. Journal 
of Molecular Biology, 250(2):258-275, 1995. 

[43] R. Jinnouchi and A. B. Anderson. Electronic structure calculations of liquid-solid interfaces: Combination of density functional theory 
and modified Poisson-Boltzmann theory. PHYSICAL REVIEW B, 77:245417, 2008. 

[44] P. Koehl. Electrostatics calculations: latest methodological advances. Current Opinion in Structural Biology, 16(2):142—51,2006. 

[45] L. A. Kuhn, M. A. Siani, M. E. Pique, C. L. Fisher, E. D. Getzoff, and J. A. Tainer. The interdependence of protein surface topography 
and bound water molecules revealed by surface accessibility and fractal density measures. Journal of Molecular Biology, 228(1):13- 
22,1992. 

[46] G. Lamm. The Poisson-Boltzmann equation. In K. B. Lipkowitz, R. Larter, and T. R. Cundari, editors, Reviews in Computational 
Chemistry, pages 147-366. John Wiley and Sons, Inc., Hoboken, N.J., 2003. 

[47] B. Lee and F. M. Richards. The interpretation of protein structures: estimation of static accessibility. J Mol Biol, 55(3):379-400,1971. 

[48] V. J. Licata and N. M. Allewell. Functionally linked hydration changes in escherichia coli aspartate transcarbamylase and its catalytic 
subunit. Biochemistry, 36(33):10161—10167, 1997. 

[49] J. R. Livingstone, R. S. Spolar, and M. T. Record Jr. Contribution to the thermodynamics of protein folding from the reduction in 
water-accessible nonpolar surface area. Biochemistry, 30(17):4237-44, 1991. 



18 


[50] D. L. Mobley and J. P. Guthrie. Freesolv: a database of experimental and calculated hydration free energies, with input files. Journal 
of Computer-Aided Molecular Design, 28:711-720, 2014. 

[51] J. Mongan, C. Simmerling, J. A. McCammon, D. A. Case, and A. Onufriev. Generalized Born model with a simple, robust molecular 
volume correction. Journal of Chemical Theory and Computation, 3(1 ):159—69, 2007. 

[52] R. R. Netz and H. Orland. Beyond Poisson-Boltzmann: Fluctuation effects and correlation functions. European Physical Journal E, 
1(2-3):203-14, 2000. 

[53] A. Nicholls, D. L. Mobley, P. J. Guthrie, J. D. Chodera, and V. S. Pande. Predicting small-molecule solvation free energies: An informal 
blind test for computational chemistry. Journal of Medicinal Chemistry, 51 (4):769-79, 2008. 

[54] M. Nina, W. Im, and B. Roux. Optimized atomic radii for protein continuum electrostatics solvation forces. Biophysical Chemistry, 
78(1 -2):89-96, 1999. 

[55] A. Onufriev, D. Bashford, and D. A. Case. Modification of the generalized Born model suitable for macromolecules. Journal of 
Physical Chemistry B, 104(15):3712-3720, 2000. 

[56] A. Onufriev, D. A. Case, and D. Bashford. Effective Born radii in the generalized Born approximation: the importance of being perfect. 
Journal of Computational Chemistry, 23(14):1297-304, 2002. 

[57] E. L. Ratkova, G. N. Chuev, V. P. Sergiievskyi, and M. V. Fedorov. An accurate prediction of hydration free energies by combination 
of molecular integral equations theory with structural descriptors. J. Phys. Chem. B, 114(37):12068-2079, 2010. 

[58] F. M. Richards. Areas, volumes, packing, and protein structure. Annual Review of Biophysics and Bioengineering, 6(1 ):151—176, 
1977. 

[59] B. Roux and T. Simonson. Implicit solvent models. Biophysical Chemistry, 78(1-2):1-20, 1999. 

[60] M. F. Sanner, A. J. Olson, and J. C. Spehner. Reduced surface: An efficient way to compute molecular surfaces. Biopolymers, 
38:305-320,1996. 

[61] K. A. Sharp and B. Honig. Calculating total electrostatic energies with the nonlinear Poisson-Boltzmann equation. Journal of Physical 
Chemistry, 94:7684-7692,1990. 

[62] K. A. Sharp and B. Honig. Electrostatic interactions in macromolecules - theory and applications. Annual Review of Biophysics and 
Biophysical Chemistry , 19:301-332, 1990. 

[63] R. S. Spolar, J. H. Ha, and M. T. Record Jr. Hydrophobic effect in protein folding and other noncovalent processes involving proteins. 
Proceedings of the National Academy of Sciences of the United States of America, 86(21 ):8382-8385, 1989. 

[64] J. M. J. Swanson, J. Mongan, and J. A. McCammon. Limitations of atom-centered dielectric functions in implicit solvent models. 
Journal of Physical Chemistry B, 109(31 ):14769-72, 2005. 

[65] D. Thomas, J. Chun, Z. Chen, G. W. Wei, and N. A. Baker. Parameterization of a geometric flow implicit solvation model. J. Comput. 
Chem., 24:687-695, 2013. 

[66] H. Tjong and H. X. Zhou. GBr6NL: A generalized Born method for accurately reproducing solvation energy of the nonlinear Poisson- 
Boltzmann equation. Journal of Chemical Physics, 126:195102, 2007. 

[67] J. Tomasi, B. Mennucci, and R. Cammi. Quantum mechanical continuum solvation models. Chem. Rev., 105:2999-3093, 2005. 

[68] V. Tsui and D. A. Case. Molecular dynamics simulations of nucleic acids with a generalized Born solvation model. Journal of the 
American Chemical Society, 122(11):2489-2498, 2000. 

[69] J. A. Wagoner and N. A. Baker. Assessing implicit models for nonpolar mean solvation forces: the importance of dispersion and 
volume terms. Proceedings of the National Academy of Sciences of the United States of America, 103(22) :8331-6, 2006. 

[70] G. W. Wei. Generalized Perona-Malik equation for image restoration. IEEE Signal Processing Lett., 6:165-167, 1999. 

[71] G. W. Wei. Differential geometry based multiscale models. Bulletin of Mathematical Biology, 72:1562 - 1622, 2010. 

[72] G.-W. Wei. Multiscale, multiphysics and multidomain models I: Basic theory. Journal of Theoretical and Computational Chemistry, 
12(8):1341006, 2013. 

[73] G. W. Wei, Y. H. Sun, Y. C. Zhou, and M. Feig. Molecular multiresolution surfaces. arXiv:math-ph/051 lOOIvl, pages 1 - 11,2005. 

[74] G.-W. Wei, Q. Zheng, Z. Chen, and K. Xia. Variational multiscale models for charge transport. SIAM Review, 54(4):699 - 754, 2012. 

[75] T. J. Willmore. Riemannian Geometry. Oxford University Press, USA, 1997. 

[76] S. N. Yu, W. H. Genq, and G. W. Wei. Treatment of qeometric singularities in implicit solvent models. Journal of Chemical Physics, 
126:244108,2007. 

[77] S. N. Yu and G. W. Wei. Three-dimensional matched interface and boundary (MIB) method for treating geometric singularities. J. 
Comput. Phys., 227:602-632, 2007. 

[78] S. N. Yu, Y. C. Zhou, and G. W. Wei. Matched interface and boundary (MIB) method for elliptic problems with sharp-edged interfaces. 
J. Comput. Phys., 224(2)729-756, 2007. 

[79] Z. Y. Yu and C. Bajaj. Computational approaches for automatic structural analysis of large biomolecular complexes. IEEE/ACM Trans 
Comput Biol Bioinform, 5:568-582, 2008. 

[80] S. Zhao. Pseudo-time-coupled nonlinear models for biomolecular surface representation and solvation analysis. International Journal 
for Numerical Methods in Biomedical Engineering, 27:1964-1981,2011. 




19 


[81] S. Zhao. Operator splitting adi schemes for pseudo-time coupled nonlinear solvation simulations. Journal of Computational Physics, 
257:1000- 1021,2014. 

[82] S. Zhao and G. W. Wei. High-order FDTD methods via derivative matching for Maxwell’s equations with material interfaces. J. 
Comput. Phys., 200(1 ):60-103, 2004. 

[83] S. G. Zhou, L. T. Cheng, H. Sun, J. W. Che, J. Dzubiella, B. Li, and J. A. McCammon. Ls-vism: A software package for analysis of 
biomolecular solvation. Journal of Computational Chemistry , 36:1047-1059, 2015. 

[84] Y. C. Zhou, M. Feig, and G. W. Wei. Highly accurate biomolecular electrostatics in continuum dielectric environments. Journal of 
Computational Chemistry, 29:87-97, 2008. 

[85] Y. C. Zhou and G. W. Wei. On the fictitious-domain and interpolation formulations of the matched interface and boundary (MIB) 
method. J. Comput. Phys., 219(1):228-246, 2006. 

[86] Y. C. Zhou, S. Zhao, M. Feig, and G. W. Wei. High order matched interface and boundary method for elliptic equations with 
discontinuous coefficients and singular sources. J. Comput. Phys., 213(1 ):1—30, 2006. 

[87] J. Zhu, E. Alexov, and B. Honig. Comparative study of generalized Born models: Born radii and peptide folding. Journal of Physical 
Chemistry B, 109(7):3008-22, 2005. 



